A Topic Detection Approach Through Hierarchical Clustering on Concept Graph

Authors

  • Xiaohui Huang
  • Xiaofeng Zhang
  • Yunming Ye
  • Shengchun Deng
  • Xutao Li
Abstract

Topic detection and tracking (TDT) algorithms have long been developed for the discovery of topics. However, most existing TDT algorithms pay too little attention to (1) the temporal distance between a pair of topics and (2) the mutual effect between highly correlated topic terms. In this paper, we propose a novel topic detection approach that applies hierarchical clustering on a constructed concept graph (HCCG) and addresses both shortcomings simultaneously. In this approach, the concept and the concept behavior curve are first defined. Then, a temporal graph is constructed with concepts as vertices, connected by edges between concepts that share the same topic terms. By performing hierarchical clustering on this concept graph, highly correlated concept behavior curves are grouped together as topics. The proposed approach is evaluated on a number of datasets, and the experimental results show that it outperforms K-means, the agglomerative hierarchical clustering algorithm (AGH), and LDA with respect to precision, recall, and F-measure. Moreover, the proposed concept behavior curves can be used to track topic change trends by monitoring the peak frequency of the curves.
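The pipeline summarized in the abstract (concepts with behavior curves, a graph whose edges link concepts that share topic terms, and hierarchical clustering over that graph) can be illustrated with a minimal sketch. This is not the authors' HCCG implementation: the abstract does not specify the similarity measure, linkage rule, or stopping criterion, so the Pearson correlation between curves, the average linkage, the `threshold` of 0.8, the function names, and the toy data below are all assumptions chosen for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length curves (assumed similarity)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def build_concept_graph(concepts):
    """Link two concepts only if they share at least one topic term.

    `concepts` maps a name to (set of terms, behavior curve).
    Returns {(name_a, name_b): curve similarity} with name_a < name_b.
    """
    edges = {}
    for a, b in combinations(sorted(concepts), 2):
        terms_a, curve_a = concepts[a]
        terms_b, curve_b = concepts[b]
        if terms_a & terms_b:
            edges[(a, b)] = pearson(curve_a, curve_b)
    return edges


def cluster_concepts(concepts, threshold=0.8):
    """Agglomerative clustering restricted to graph edges (assumed rule)."""
    edges = build_concept_graph(concepts)
    clusters = [{name} for name in sorted(concepts)]

    def linkage(c1, c2):
        # Average similarity over the edges that connect the two clusters.
        sims = [edges[tuple(sorted((a, b)))]
                for a in c1 for b in c2
                if tuple(sorted((a, b))) in edges]
        return sum(sims) / len(sims) if sims else None

    while True:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            score = linkage(clusters[i], clusters[j])
            if score is not None and score >= threshold:
                if best is None or score > best[0]:
                    best = (score, i, j)
        if best is None:
            return clusters
        _, i, j = best
        clusters[i] |= clusters[j]
        del clusters[j]


def topic_peak(concepts, cluster):
    """Time index at which the summed behavior curve of a topic peaks."""
    curves = [concepts[name][1] for name in cluster]
    total = [sum(values) for values in zip(*curves)]
    return total.index(max(total))


# Hypothetical toy data: term sets and per-period frequencies.
concepts = {
    "quake": ({"earthquake", "rescue"}, [0, 5, 9, 4, 1, 0]),
    "relief": ({"rescue", "aid"}, [0, 4, 8, 5, 1, 0]),
    "election": ({"vote", "poll"}, [3, 1, 0, 0, 2, 6]),
    "ballot": ({"vote", "count"}, [2, 1, 0, 0, 3, 7]),
}
topics = cluster_concepts(concepts)
for topic in topics:
    print(sorted(topic), "peaks at period", topic_peak(concepts, topic))
```

In this toy setup, concepts without a shared term are never merged, however similar their curves are, which mirrors the role the abstract gives to edges. The per-topic peak index is one simple way to read a trend from the behavior curves.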


Similar resources

Graph Clustering by Hierarchical Singular Value Decomposition with Selectable Range for Number of Clusters Members

Graphs have many applications in real-world problems. When we deal with huge volumes of data, analyzing the data is difficult or sometimes impossible. In big data problems, clustering is a useful tool for data analysis. Singular value decomposition (SVD) is one of the best algorithms for clustering graphs, but it does not let us select the number of clusters and the number of members ...


Document Clustering Based on Ontology and a Fuzzy Approach

Data mining, also known as knowledge discovery in databases, is the process of discovering unknown knowledge from a large amount of data. Text mining applies data mining techniques to extract knowledge from unstructured text. Text clustering, one of the important techniques of text mining, is the unsupervised classification of similar documents into different groups. The most important step...


Graph-based Visual Saliency Model using Background Color

Visual saliency is a cognitive psychology concept that makes some stimuli of a scene stand out relative to their neighbors and attract our attention. Computing visual saliency is a topic of recent interest. Here, we propose a graph-based method for saliency detection, which contains three stages: pre-processing, initial saliency detection and final saliency detection. The initial saliency map i...


A k-core Decomposition Framework for Graph Clustering

Graph clustering or community detection constitutes an important task for investigating the internal structure of graphs, with a plethora of applications in several domains. Traditional techniques for graph clustering, such as spectral methods, typically suffer from high time and space complexity. In this article, we present CoreCluster, an efficient graph clustering framework based on the conc...


Analyzing Motorcycle Crash Pattern and Riders’ Fault Status at a National Level: A Case Study from Iran

Motorcycle crashes constitute a significant proportion of traffic accidents all over the world. The aim of this paper was to examine motorcycle crash patterns and rider fault status across the provinces of Iran. For this purpose, 6638 motorcycle crashes that occurred in Iran from 2009 to 2012 were used as the analysis data, and a two-step clustering approach was adopted as the analysis framework....




Publication date: 2013